Effective Visualizations

Now that you know how to create graphics and visualizations in R, you are armed with powerful tools for scientific computing and analysis. With this power also comes great responsibility. Effective visualization is an incredibly important aspect of scientific research and communication, and several books (see Resources) have been written about its principles. In class today we will go through several case studies to develop some expertise in making effective visualizations.

Worksheet

The worksheet questions for today are embedded into the class notes.

You can download this Rmd file here

Note: there will be very little coding in class today, but I’ve given you plenty of exercises in the form of a supplemental worksheet (linked at the bottom of this page) to practice with after class.

Resources

  1. Fundamentals of Data Visualization by Claus Wilke.

  2. Visualization Analysis and Design by Tamara Munzner.

  3. STAT545.com - Effective Graphics by Jenny Bryan.

  4. ggplot2 book by Hadley Wickham.

  5. Callingbull.org by Carl T. Bergstrom and Jevin West.

Part 1: Warm-up and pre-test [20 mins]

Warmup:

Write some notes here about what “effective visualizations” means to you. Think of elements of good graphics and plots that you have seen - what makes them good or bad? Write 3-5 points.

  1. They need good labels and headings
  2. They should have units
  3. They need a legend if there are multiple lines
  4. They should have a description below of what the figure shows

CQ01: Weekly hours for full-time employees

Question: Evaluate the strength of the claim based on the data: “German workers are more motivated and work more hours than workers in other EU nations.”

Very strong, strong, weak, very weak, do not know

  • Weak, because the data tells us nothing about motivation. There’s also no measure of uncertainty, and we don’t know how many people the data comes from.

  • Main takeaway: The gridlines add clutter because the numbers are already shown. The axes skew the visualization. The graph isn’t very helpful because all the weekly work hours are very similar.
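One common way axes skew a bar chart is truncation: a bar’s length encodes its value, so cutting off the bottom of the y-axis makes small differences look dramatic. We don’t have the original working-hours data, so this minimal sketch uses base R’s `PlantGrowth` dataset as a stand-in, because its three group means are also fairly similar. The object names are illustrative only.

```r
library(ggplot2)

# Mean dried plant weight per treatment group (PlantGrowth ships with base R)
plant_means <- aggregate(weight ~ group, data = PlantGrowth, FUN = mean)

# Truncated axis: zooming in on 4.5-5.6 exaggerates small differences between bars
p_truncated <- ggplot(plant_means, aes(x = group, y = weight)) +
  geom_col() +
  coord_cartesian(ylim = c(4.5, 5.6)) +
  labs(x = "Treatment group", y = "Mean dried weight")

# Honest version: geom_col() bars start at zero, so bar length matches the value
p_zero <- ggplot(plant_means, aes(x = group, y = weight)) +
  geom_col() +
  labs(x = "Treatment group", y = "Mean dried weight")
```

Print `p_truncated` and `p_zero` one after the other: the same three means look very different depending only on where the axis starts.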

CQ02: Average Global Temperature by year

Question: For the years this temperature data is displayed, is there an appreciable increase in temperature?

Yes, No, Do not know

  • Don’t know. It could be quite a large change depending on how many years the data covers. The value goes from about 57 to 59, and 2 degrees of average warming is quite a lot, although this is in Fahrenheit, which I don’t have a good intuition for.

  • Main takeaway: The y-axis starts at 0 but shouldn’t, because this hides the variation in the data. It would also be useful to show the years.
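Unlike bars, a line chart encodes values by position rather than length, so its y-axis does not need to include 0. This sketch uses base R’s `nhtemp` dataset (mean annual temperature in New Haven, Connecticut, in degrees Fahrenheit, 1912 to 1971) as a stand-in for the chart in CQ02; it is not the original data.

```r
library(ggplot2)

# nhtemp is a yearly time series; convert it to a data frame for ggplot
temps <- data.frame(year   = as.numeric(time(nhtemp)),
                    temp_f = as.numeric(nhtemp))

# Forcing the y-axis to include 0 flattens the year-to-year variation
p_from_zero <- ggplot(temps, aes(x = year, y = temp_f)) +
  geom_line() +
  expand_limits(y = 0) +
  labs(x = "Year", y = "Mean annual temperature (degrees F)")

# Default limits fit the range of the data, so changes over time are visible
p_fitted <- ggplot(temps, aes(x = year, y = temp_f)) +
  geom_line() +
  labs(x = "Year", y = "Mean annual temperature (degrees F)")
```

Both versions also label the years and the units, which addresses the other points raised in the discussion.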

CQ03: Gun deaths in Florida

Question: Evaluate the strength of the claim based on the data: “Soon after this legislation was passed, gun deaths sharply declined.”

Very strong, strong, weak, very weak, do not know

  • Weak; the plot is the wrong way around (upside down).

  • Main takeaway: It’s misleading to flip the y-axis upside down, because it makes it look like gun deaths decreased after the law was enforced, but that is not the case.
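To see how much an inverted y-axis changes the message, here is a minimal sketch using base R’s `airmiles` dataset (yearly US airline revenue passenger miles, 1937 to 1960), chosen only because it grows over the period; it has nothing to do with the Florida data. `scale_y_reverse()` is the ggplot2 function that flips the y-axis.

```r
library(ggplot2)

# airmiles is a yearly time series; convert it to a data frame for ggplot
air <- data.frame(year  = as.numeric(time(airmiles)),
                  miles = as.numeric(airmiles))

# Conventional orientation: larger values sit higher, so growth reads as a rise
p_conventional <- ggplot(air, aes(x = year, y = miles)) +
  geom_line() +
  labs(x = "Year", y = "Revenue passenger miles")

# Reversed y-axis: exactly the same data now appears to decline over time
p_reversed <- p_conventional + scale_y_reverse()
```

Nothing about the data changes between the two plots; only the direction of the axis does, which is why readers should always check which way the axes run.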

Part 2: Extracting insight from visualizations [20 mins]

A great resource for selecting the right plot is https://www.data-to-viz.com/; I encourage you all to consult it when choosing how to visualize data.

Darkhorse Analytics:

  1. remove backgrounds
  2. remove redundant labels
  3. remove borders
  4. reduce colours
  5. remove special effects
  6. remove bolding
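Several of these steps map directly onto ggplot2 theme and aesthetic settings. This is a minimal sketch, not the Darkhorse Analytics example itself: it uses the `mpg` data that ships with ggplot2 as a stand-in, and the object names are illustrative only.

```r
library(ggplot2)

# Mean highway fuel economy per vehicle class (the mpg data ships with ggplot2)
hwy_by_class <- aggregate(hwy ~ class, data = mpg, FUN = mean)

# Cluttered version: grey background, black borders, bold text,
# and a fill legend that repeats the x-axis labels
p_cluttered <- ggplot(hwy_by_class, aes(x = class, y = hwy, fill = class)) +
  geom_col(colour = "black") +
  theme_grey() +
  theme(panel.border = element_rect(colour = "black", fill = NA),
        text = element_text(face = "bold"))

# Stripped-down version following the list above
p_clean <- ggplot(hwy_by_class, aes(x = reorder(class, hwy), y = hwy)) +
  geom_col(fill = "grey40") +                  # one colour, so no redundant legend
  labs(x = "Vehicle class",
       y = "Mean highway fuel economy (miles per gallon)") +
  theme_minimal() +                            # no grey background, border, or bold
  theme(panel.grid.major.x = element_blank(),  # vertical gridlines add nothing here
        panel.grid.minor = element_blank())
```

Printing `p_cluttered` and then `p_clean` shows the same seven values; the second version simply removes everything that doesn’t carry information.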

Case Study 1: Context matters

The axes are very different

Case Study 2: A case for pie charts

Part 3: Principles of effective visualizations [20 mins]

We will be filling these principles in together as a class

Make a great plot worse

Instructions: Below is a code chunk that shows an effective visualization. First, copy this code chunk into a new cell. Then, modify it to purposely make this chart “bad” by breaking the principles of effective visualization above. Your final chart still needs to run/compile and it should still produce a plot.

How many of the principles did you manage to break?

Plotly demo [10 mins]

Did you know that you can make interactive graphs and plots in R using the plotly library? We will show you a demo of what plotly is and why it’s useful, and then you can try converting a static ggplot graph into an interactive plotly graph.

This is a preview of what we’ll be doing in STAT 547 - making dynamic and interactive dashboards using R!

library(tidyverse)  # ggplot2, dplyr, and the %>% pipe
library(gapminder)  # the gapminder dataset
library(plotly)     # ggplotly() and plot_ly()
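# Static ggplot: life expectancy vs. GDP per capita, coloured by continent
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, colour = continent)) +
  geom_point()
p

# Convert the static ggplot into an interactive plotly graph
p %>% 
  ggplotly()

# Or build the same plot directly with plot_ly()
gapminder %>% 
  plot_ly(x = ~gdpPercap,
          y = ~lifeExp,
          color = ~continent, # need to make sure to type "color" and not "colour"
          type = 'scatter',
          mode = 'markers')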
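# Optional, to publish online: first set your Chart Studio credentials (placeholder values),
# Sys.setenv("plotly_username" = "your_username", "plotly_api_key" = "your_api_key")
# then create the online plot through the API
# api_create(ggplotly(p), filename = "gapminder-scatter")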

Supplemental worksheet (Optional)

You are highly encouraged to complete the cm013 supplemental exercises worksheet. It is a great guide that will take you through scales, colours, and themes in ggplot. There is also a short guided activity showing you how to make a ggplot interactive using plotly.